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Abstract 

Substantial improvement in accuracy of identified linear time-invariant single-input multi-output (SIMO) dynamical models is 
possible when the disturbances affecting the output measurements are spatially correlated. Using an orthogonal representation 
for the modules composing the SIMO structure, in this paper we show that the variance of a parameter estimate of a module 
is dependent on the model structure of the other modules, and the correlation structure of the disturbances. In addition, we 
quantify the variance-error for the parameter estimates for finite model orders, where the effect of noise correlation structure, 
model structure and signal spectra are visible. From these results, we derive the noise correlation structure under which the 
mentioned model parameterization gives the lowest variance, when one module is identified using less parameters than the 
other modules. 
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1 Introduction 


gained popularity, see e.g., Van den Hof et al .| (1 

20131; 

Dankers et al. 

(|2013&|a 2014); Ali et al. ( 

2011); 

Mat- 

erassi et al. (|2011|);|Torres et al. J2014); Ha 

3er anc 

Ver- 

haegen 

«2014|; Ounes et al. (j20l4); Chiuso and Pit- 

lonetto 

(2012). In this framework, signais are modeled 


To estimate a transfer function in the network, a large 
number of methods have been proposed. Some have been 
shown to give consistent estimates, provided that a cer¬ 
tain subset of signals is includ ed in the identification pro¬ 
cess ( |Dankers et al. 20136|« I. In these methods, the user 
has the freedom to include additional signals. However, 
little is known on how these signals should be chosen, 
and how large the potential is for variance reduction. 
From a theoretical point of view there are only a few re- 
sult s rega r ding specific network structures e.g.,|Hagg e1\ 
al. (2011 b ; Ramazi et al. (2014); Wahlberg et al. (2009). 
To get a better understanding of the potential of adding 
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available information to the identification process, we 
ask the fundamental questions: will, and how much, an 
added sensor improve the accuracy of an estimate of a 
certain target transfer function in the network? We shall 
attempt to give some answers by focusing on a special 
case of dynamic networks, namely single-input multi¬ 
output (SIMO) systems, and in a wider context, dynamic 
networks. 


SIMO models are interesting in themselves. They find 
applications in various discipline s, such as signal pro ¬ 
cessing and speech enhance ment (Benesty et al. 2005), 
(Docl o and Moonen| 2002 1, c ommunications (Bertaux 
et al |1999| l7 ^ichmidtH1986|), flT rudnows ki et all|1998| l, 
biomedical scienc es QMcCombie et al. " 2005p and struc¬ 
tural engineering ( Ulusoy et al. 2011). Some of these ap¬ 
plications are concerned witn spatio-temporal models, 
in the sense that the measured output can be strictl y re¬ 
lated to the loc at ion at which the sen so r is placed (Sto- 
1994), (Viberg et al. 1997), (Viberg and Ot- 


ica et al. 


tersten 


__ 99ip . In these cases, it is reasonable to expect 

that measurements collected at locations close to each 
other are affected by disturbances of the same nature. In 
other words, noise on the outputs can be correlated; un¬ 
derstanding how this noise correlation affects the accu¬ 
racy of the estimated model is a key issue in data-driven 
modeling of SIMO systems. 


For SIMO systems, our aim in this contribution is to 
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quantify the model error induced by stochastic distur¬ 
bances and noise in prediction error identification. We 
consider the situation where the true system can accu¬ 
rately be described by the model, i.e., the true system 
lies within the set of models used, and thus the bias 
(systematic) error is zero. Then, the model error mainly 
consists of the variance-error, which is caused by distur¬ 
bances and noise when the model is estimated using a 
finite number of input-output samples. In particular, we 
shall quantify the variance error in terms of the noise 
covariance matrix, input spectrum and model structure. 
These quantities are also crucial in answering the ques¬ 
tions we have posed above, namely, when and how much, 
adding a sensor pays off in terms of accuracy of the es¬ 
timated target transfer function. 


There are expressions for the model error, in terms of 
the asymptotic (in sample size) (co-) variance of the 
estimated parameters, for a vari ety of identi fi cation 
methods for multi -output systems (|Ljung 1999),(Ljung 


and Caines 
respond to t 


1980). Even though these expressions cor- 
re Cramer-Rao lower bound, they are typi¬ 


cally rather opaque, in that it is difficult to discern how 
the model accuracy is influenced by the aforementioned 
quantities. There is a well-known expression for the vari¬ 
ance of an estimated frequency response function that 
lends itself to the kind of analysis we wish to facilitate 


(Ljung 1985 Yuan and Ljung 1984 Zhu 19891. This 


formula is given in its SLVIO version by (see e.g., Zhu 

( 2001 )) 


1.1 Contribution of this paper 

As a motivation, let us first introduce the following two 
output example. Consider the model: 


2/i (t) = 0i,i u(t - 1) + ei (t), 
U 2 (t) = 02,2 u(t — 2) + e 2 (t), 


where the input u(t) is white noise and ek,k = 1,2 is 
measurement noise. We consider two different types of 
measurement noise (uncorrelated with the input). In the 
first case, the noise is perfectly correlated, let us for sim¬ 
plicity assume that ei(t) = e 2 (t). For the second case, 
ei (t) and e 2 (t) are independent. It turns out that in the 
first case we can perfectly recover the parameters 0i.i 
and 02 . 2 , while, in the second case we do not improve the 
accuracy of the estimate of 9\ -| by also using the mea¬ 
surement y 2 {t). The reason for this difference is that, in 
the first case, we can construct the noise free equation 

2/i(£) - 2/2 (t) = 0i,i u(t - 1) - 0 2 ,2u(t - 2) 

and we can perfectly recover 9i t i and 02 , 2 , while in the 
second case neither y 2 (t) nor e 2 (t) contain information 
about ei(t). 

Also the model structure plays an important role for the 
benefit of the second sensor. To this end, we consider 
a third case, where again ei(f) = e 2 (t). This time, the 
model structure is slightly different: 


CovG(e^) * 


( 1 ) 


where <P U is the input spectrum and <P V is the spectrum 
of the noise affecting the outputs. Notice that the ex¬ 
pression is valid for large number of samples N and 
large model order n. For finite model order there are 
mainly results for SISO models jHjalmarsson and Nin- 


2006; |Ninness and Hjalmarssonj |2004[ |2005a|6) 
icently, multi-input-singie-output (MISO) models. 


ness 

and recently, 

For MISO models, the concept of connectedness (|Gev 


ers et al. 2006) gives conditions on when one input can 
help reduce the variance of an ide ntified transfer func - 
tion. These results were refined in Martensson (2007). 
For white inputs, it was recently shown in |Ramazi e/| 
al. (2014) that an increment in the model order of one 
transfer function leads to an increment in the variance of 
another estimated transfer function only up to a point, 
after which the variance levels off. It was also quan¬ 
tified how correlation between the inputs may reduce 
the accuracy. The r esults presente d here are similar 

while they 


(2014) 


in nature to those in Ramazi et al. 
regard another special type of multi-variable models 
namely multi-input single-output (MISO) models. Note 
that variance expressions for t he special case of SIMP 
cascade structures a re found in Wahlberg et al. (2009), 
Everitt et al. (20131. 


2/i (£) = 6i,iu{t - 1) + ei(t), 

2/2 (£) = 0 2 ,iu(t - 1) + e 2 (t). 

In this case, we can construct the noise free equation 

yi{t) - 2/2 (t) = (0i,i - 0 2>2 )u{t - 1). 

The fundamental difference is that now only the differ¬ 
ence (0i,i — 02 ,i) can be recovered exactly, but not the 
parameters 0i,i and 0 2 ,i themselves. They can be identi¬ 
fied from i/I (t) and y 2 {t) separately, as long as yi and y 2 
are m easure d. A similar consideration is made in | Ljung] 
et al. (2011), where SIMO cascade systems are consid¬ 
ered. 

This paper will generalize these observations in the fol¬ 
lowing contributions: 

• We provide novel expressions for the variance-error 
of an estimated frequency response function of a 
SIMO linear model in orthonormal basis form, or 


equivalently, in fixed denominator form (Ninness et 


al. 19991. The expression reveals how the noise cor¬ 
relation structure, model orders and input variance 
affect the variance-error of the estimated frequency 
response function. 
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• For a non-white input spectrum, we show where in 
the frequency spectrum the benefit of the correla¬ 
tion structure is focused. 

• When one module, i.e., the relationship between the 
input and one of the outputs, is identified using less 
parameters, we derive the noise correlation struc¬ 
ture under which the mentioned model parameter¬ 
ization gives the lowest total variance. 

The paper is organized as follows: in Section 2 we define 
the SIMO model structure under study and provide an 
expression for the covariance matrix of the parameter 
estimates. Section 3 contains the main results, namely 
a novel variance expression for LTI SIMO orthonormal 
basis function models. The connection with MISO mod¬ 
els is explored in Section 4. In Section 5, the main results 
are applied to a non-white input spectrum. In section 6 
we derive the correlation structure that gives the mini¬ 
mum total variance, when one block has less parameters 
than the other blocks. Numerical experiments illustrat¬ 
ing the application of the derived results are presented in 
Section 7. A final discussion ends the paper in Section 8 . 


2 Problem Statement 



Fig. 1. Block scheme of the linear SIMO system. 

We consider linear time-invariant dynamic systems with 
one input and m outputs (see Fig. 1). The model is de¬ 
scribed as follows: 


2 /i {*) 




ei (t) 

2/2 (t) 

= 

GM 

u(t.) + 

e 2 (t) 



G m (q)_ 


Tm{t)_ 


where q denotes the forward shift operator, i.e., qu(t) = 
u(t + 1) and the Gi(q ) are causal stable rational transfer 
functions. The Gi are modeled as 


where n\< ...< n m and Ti(q) = |Bi B ni {q) 
for some orthonormal basis functions {£>fc(<z)}fc=i- Or¬ 
thonormal with respect to the scalar product defined for 
complex functions f{z),g{z) : C —> C lxrra as (/, g) := 
^ flj{e lul )g*(e lul ) dw. Let us introduce the vector no¬ 
tation 



1 

_1 


ei(t) 

y(t) : = 

2/2 (t) 

, eft) := 

e 2 {t) 


1 

s 

_1 


1 

-to 

s 

_ 1 


The noise sequence |e(f)} is zero mean and temporally 
white, but may be correlated in the spatial domain: 


E[e(t)]=0 

E [eit)e(s) 1 ] = 6 t - s A (4) 

for some positive definite matrix covariance matrix A, 
and where E [•] is the expected value operator. We ex¬ 
press A in terms of its Cholesky factorization 

A = Ach^chi (5) 


where Aqh G R mxm is lower triangular, i.e., 


Ach 


7n 0 
721 722 


0 

0 

0 


O 'ml 0Vn2 • • • 7mm, 


( 6 ) 


for some { 7 jj}. Also notice that since A > 0, 


A' 1 = A- c t h A- c ^ h . 


(7) 


We summarize the assumptions on input, noise and 
model as follows: 


Assumption 1 The input {w(f)} is zero mean station¬ 
ary white noise with finite moments of all orders, and 
variance a 2 > 0. The noise {e(t)} is zero mean and tem¬ 
porally white, i.e, (4) holds with A > 0, a positive def¬ 
inite matrix. It is assumed that E [|e(f)| 4+p ] < 00 for 
some p > 0. The data is generated in open loop, that is, 
the input {■u(f)} is independent of the noise {e(f)}. The 
true input-output behavior of the data generating system 
can be captured by our model, i. e the true system can be 
described by (2) and (3) for some parameters 9° € R rai , 
i = 1,..., m, where n± < ... < n m . The orthonormal 
basis functions {Bkfq)} are assumed stable. □ 


Gi(q, 9i) 


BiiqWi, 


0i € i = 1,..., m, (3) 


Remark 2 Generalization of the model structure: 


3 






The assumption that the modules have the same or¬ 
thogonal parameterization in (3) is less restrictive 
than it might appear at first glance, and made for 
clarity and ease of presentation. A model consist¬ 
ing of non-orthonormal basis functions can be trans¬ 
formed into this format by a linear transformation, 
which can be computed b y the Gr am-Schmidt proce¬ 


dure | Trefethen and Bau 1991). Notice that fixed 


denominator models, with the same denominator in 


all transfer functions, also fit this framework (Nin- 


ness et al. 1999). However, it is essential for our 
analysis that the modules share poles. 

It is not necessary to restrict the input to be 
white noise. A colored input introduces a weight¬ 
ing by its spectrum <P U , which means that the 
basis functions need to be orthogonal with re¬ 
spect to the inner product {f,g)$ '■= (f^u^g). 

If <P u (z) = a 1 R(z)R*(z), where R{z) is a monic 
stable minimum phase spectral factor; the trans¬ 
formation Ti(q) = a~ 1 R(q)~ 1 Ti(q) is a procedure 
that gives a set of orthogonal basis functions in the 
weighted space. If we use this parameterization, all 
the main results of the paper carry over naturally. 
However, in general, the new parameterization does 
not contain the same set of models as the original 
parameterization. Another way is to use the Gram- 
Schmidt method, which maintains the same model 
set and the main results are still valid. If we would 
like to keep the original parameterization, we may, 
in some cases, proceed as in Section 5 where the 
input signal is generated by an AR-spectrum. 

Note that the assumption that n\ < ... < n m is not 
restrictive as it only represents an ordering of the 
modules in the system. 


2.1 Weighted least-squares estimate 


By introducing 9 = 


1 T 


Of ,...,91 


E ra 

i= 1 ' 


and the n x m transfer function matrix 


<T(q) = 


A 0 0 

0 0 

o o r m 


we can write the model ( 2 ) as a linear regression model 
y(t) = ip T (t)9+ e(t), ( 8 ) 

where 

<p T (t) = 'P{q) T u(t). 

An unbiased and consistent estimate of the parameter 
vector 6 can be obtained from weighted least-squ ares, 
with optimal weighting matrix A -1 (see, e.g., |Ljung 


(1999); [Soderstrom and Stoica (19891). A is assumed 
known, however, this assumption is not restrictive since 
A can be estimated from data. The estimate of 9 is given 
by 


N 


N 


On = ['^‘PitfA V T 0) 1 y{t). (9) 

\i=i / t=l 

Inserting ( 8 ) in (9) gives 

/ N \~ X N 

9 N = 0 + ( <p(t)A~ V T (t) ] <p(t)A~ 1 e(t). 


U= 1 


t —1 


Under Assumption 1, the noise sequence is zero mean, 
hence On is unbiased. It can be noted that this is the 
same estimate as the one obtained by the prediction er¬ 
ror method and, if the no ise is Gauss ian, by the max¬ 
imum likelihood method (Ljung 1999). It also follows 


that the asymptotic covariance matrix of the parameter 
estimates is given by 


AsCov (A = (E \ip(t)A 1 ip T (t)]) . 


( 10 ) 


Here AsCov On is the asymptotic covariance matrix 
of the parameter estimates, in the sense that the 
asymptotic covariance matrix of a stochastic sequence 
{/w}^=i> /jv S C lxq is defined at0 

AsCov f N := lim N- E [(f N - E [f N ])* (/at - E [/at])]. 
N—>oo 

In the problem we consider, using Parseval’s formula 
and (7), the asymptotic covariance matrix, (10), can be 
written a£l| 

AsCov On = [ 7 ^ / (e ju ) du 

. J ~ 7T 

= (W\ (11) 


where 


a?) = -*mc T H - (i2) 

Note that H?{q) is block upper triangular since T{q) is 
block diagonal and Af,^ is upper triangular. 


This definition is slightly non-standard in that the second 
term is usually conjugated. For the standard definition, in 
general, all results have to be transposed, however, all results 
in this paper are symmetric. 

2 Non-singularity of (A <T) usually requ ires paramete r iden- 
tifiability and persistence of excitation (Ljung 19991. 
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2.2 The introductory example 

With formal assumptions in place, we now consider the 
introductory example in greater detail. Consider the 
model 


?/i (f) = 0i,i q 1 u(t)+e 1 {t), (13) 

2/2 {t) = + e 2 aq~ 2 u{t) + e 2 (t) (14) 


which uses the delays q _1 and q~ 2 as orthonormal basis 
functions. With 9 = [ 6 * 1,1 02 ,l $ 2 , 2 ] T ; the corresponding 
regression matrix is 


u(t — 1 ) 0 0 

0 u(t — 1 ) u(t — 2 ) 


ip(t) T = 

The noise vector is generated by 


d(6) 


1 0 

Wi(t) 

e 2 (6)_ 


- 1 

< 0 . 

CN 

<33. 

r-H 

> 

1 

_w 2 {t)_ 


. (15) 


where wi(t) and w 2 (t) are uncorrelated white processes 
with unit variance. The parameter /? £ [0, 1] tunes the 
correlation between ei(t) and e 2 {t). When 3 = 0, the two 
processes are perfectly correlated (i.e., identical); con¬ 
versely, when 3 = 1, they are completely uncorrelated. 
Note that, for every 3 € [0: 1], one has E [ei(t) 2 ] = 
E [e 2 (£) 2 ] = 1. In fact, the covariance matrix of eft) be¬ 
comes 


A = LL 1 = 


1 


yfY^P 2 1 

Then, when computing (11) in this specific case gives 


AsCov = — 
a z 


1 yfl^W 0 




1 0 

o 3 2 


We note that: 


AsCov 


' 1,1 <' 2,1 


1 T 1 , 

= W 71 ’ 

(7 Z 
1 


AsCov 9 2 2 = —3 ■ 


cr* 


(16) 


correlation of the noise processes, i.e., they are in¬ 
dependent of the value of /?. However, note that the 
cross correlation between 6 * 1,1 and 6 * 2,1 in (16): 


Var ( 01 , 1 -a/I - 3 2 9 2 , 1 ) 


T—1 

T 

Id 

T—1 

[-y/l- 3 2 \ 

q 

tc 

[-y/l-pj 



(17) 


This cross correlation will induce a cross correlation 
in the transfer function estimates as well. 

( 2 ) As seen in (16), the variance of 6 * 1,2 strongly de¬ 
pends on 3- In particular, when 3 tends to 0 , one 
is ideally able to estimate 6* 12 perfectly. Note that 
in the limit case 3 = 0 one has ei(£) = e 2 {t ), so 
that ( 2 ) can be rearranged to obtain the noise-free 
equation 


2/1 (*) - 2 / 2 ( 6 ) = ( 6 * 1,1 ^ 6 * 2,1 )u(t - 1) + 6 *i, 2 m(6 - 2), 

which shows that both 6 * 1,2 and the difference 9 1,1 — 
02 ,i can be estimated perfectly. This can of course 
also be seen from (16), cf. (17). 


The example shows that correlated measurements can 
be favorable for estimating for estimating certain pa¬ 
rameters, but not necessarily for all. The main focus of 
this article is to generalize these observations to arbi¬ 
trary basis functions, number of systems and number of 
estimated parameters. Additionally, the results are used 
to derive the optimal correlation structure of the noise. 
But first, we need some technical preliminaries. 


2.3 Review of the geometry of the asymptotic variance 


The following lemma is instrumental in deriving the re¬ 
sults that follow. 



spanned by the rows ofW and {Bf}^ =1 ,r < n be an or¬ 
thonormal basis for S&. Suppose that J'(9°) £ C nX9 
the gradient of J with respect to 9 and J'(9° ) = d , {z 0 )L 
for some Zq £ C and L £ <C mxq . Then 


AsCov J(0 N ) = L*J2 Bi{z 0 yBi{z 0 ) L. (18) 
fe =1 


The above expressions reveals two interesting facts: 

(1) The (scalar) variances of 6 * 1,1 and 02 . 1 , namely the 
estimates of parameters of the two modules related 
to the same time lag, are not affected by possible 


2-4 Non-estimable part of the noise 

As seen in the example of Section 2.2, strong noise cor¬ 
relation may be helpful in the estimation. In fact, the 
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variance error will depend on the non-estimable part of 
the noise, i.e., the part that cannot be linearly estimated 
from other noise sources. To be more specific, define the 
signal vector ej\i(t) to include the noise sources from 
module 1 to module j, with the one from module i ex¬ 
cluded, i.e., 

e j\i(t) - = 



Now, the linear minimum variance estimate of ei given 
£j\i(t), is given by 

•= (19) 

Introduce the notation 

A i\j ■= Var [ei(t) - e Aj (t)\, (20) 

with the convention that Aqo := A The vector in 
(19) is given by 

Qij = [Cove jVi (f)] _1 E [e Ai (t)ei{t)\ . 

We call 

G(^) &i\j if) 

the non-estimable part of e.i(f) given e^it). 

Definition 4 When eju(f) does not depend on ek(t), 
where 1 < k < j, k i, we say that e.i(t ) is orthogonal 
to e/j(f) conditionally to e^ft). 

The variance of the non-estimable part of the noise is 
closely related to the Cholesky factor of the covariance 
matrix A. We have the following lemma. 

Lemma 5 Let e(f) £ R m have zero mean and covari¬ 
ance matrix A > 0. Let Ach he the lower triangular 
Cholesky factor of A, i.e., Aqh satisfies (5), with {7 
as its entries as defined by ( 6 ). Then for j < i, 

i 

a»ij = yy Aik- 

k=j +1 

Furthermore, 7 ij = 0 is equivalent to that ei{t) is orthog¬ 
onal to ej(t ) conditionally to ej\i(t). 

PROOF. See Appendix A. ■ 


Similar to the above, for i < m, we also define 

r 1 T 

ei:m(t) '■= 6i{t) ... e m (t) , 

and for j < i we define e l . rn \j (t) as the linear minimum 
variance estimate of ei- m (t) based on the other signals 
ej\i(t), and 

- = Cov [e ? ; :m (t) (f')\- 

As a small example of why this formulation is useful, 
consider the covariance matrix below, where there is cor¬ 
relation between any pair (ei(t),ej(t)): 

1 0.6 0.9] f" 1 0 0 1 ll 0.6 0.9 

A = 0.6 1 0.54 = 0.6 0.8 0 0 0.8 0 . 

0.9 0.54 1 0.9 0 0.44 0 0 0.44 

v j 

Ach 

From the Cholesky factorization above we see that, since 
732 is zero, Lemma 5 gives that 63 (f) is orthogonal to 
62 (f) given e 2 \ 3 (f), i.e., there is no information about 
63 (f) in 62 (f) if we already know ei(f). This is not ap¬ 
parent from A where every entry is non-zero. If we know 
ei(f) a considerable part of 62 (f) and 63 (f) can be esti¬ 
mated. Without knowing ei(f), Ai = A 2 = A 3 = 1, while 
if we know ei(f), A 2 |i = 0.64 and A 3 M = 0.19. 

3 Main results 

In this section, we present novel expressions for the 
variance-error of an estimated frequency response func¬ 
tion. The expression reveals how the noise correlation 
structure, model orders and input variance affect the 
variance-error of the estimated frequency response func¬ 
tion. We will analyze the effect of the correlation struc¬ 
ture of the noise on the transfer function estimates. To 
this end, collect all in transfer functions into 

G := Gi G 2 ■ ■ ■ G m ■ 

For convenience, we will simplify notation according to 
the following definition: 

Definition 6 The asymptotic covariance of G(e JU, °) := 
G{e ^°, 6 N ) for the fixed frequency ujq is denoted by 

AsCov G. 

In particular, the variance of Gj(e JW °) := Gj(e J “° ,6f) 
for the fixed frequency ujq is denoted by 

AsVar G*. 


6 



Define Xk as the index of the first system that contains the 
basis function Bk(e JU10 ). Notice that \k — 1 is the number 
of systems that do not contain the basis function. 


Theorem 7 Let Assumption 1 hold. Suppose that the 
parameters 0.j £ i = 1 ,m, are estimated using 
weighted least-squares (9). Let the entries of 9 be arranged 
as follows: 

9 = [01,1 ■ • ■ 0m,1 01,2 ■ • ■ 0m,2 ■ ■ • 01,m • • ■ 

■ • • 0m,ni 02,ni + l ■ ■ • 0m,ni + l • • ■ 0m,n m ] - (21) 


and the corresponding weighted least-squares estimate be 
denoted by 6. Then, the covariance of 9 is 

AsCov 0 —— diag(vli. m , A X2 . m | X2 _i,..., 

' • ■ > y ^Xra m : ”»IXn m — l)* (22) 

In particular, the covariance of the parameters related to 
basis function number k is given by 

AsCov 9k = ~^ A xk-m\xk-u (23) 


where 


9k = 


i T 


®Xk,k * * • 0m,A; 


and where, for Xk < i < m, 


AsVar0 Jjfc = 


(24) 


It also holds that 

rim 

AsCov G = 


fc =i 


°Xfc —i 


0 


0 AsCov 9k 


\B k (ei“°)\ 2 , (25) 


where AsCov 9k is given by (23) and 0 Xfc _i is a Xk ~ 
1 x Xk — 1 matrix with all entries equal to zero. For 
Xk = 1, 0 Xfc _i is an empty matrix. In (25), 0 denotes 
zero matrices of dimensions compatible to the diagonal 
blocks. 


PROOF. See Appendix B. ■ 

Remark 8 The covariance of 9k, which contain the pa¬ 
rameters related to basis function k, is determined by 
which other models share the basis function Bk,' cf. (17) 
of the introductory example. The asymptotic covariance 
ofG can be understood as a sum of the contributions from 
each of the n m basis functions. The covariance contribu¬ 
tion from a basis function Bk is weighted by |Bfe(e : '“ cl )| 2 
and only affects the covariance between systems that con¬ 
tain that basis function, as visualized in Figure 2. 



AsCov 0i \BAe K 



AsCov 6k |23fc(e“)| 2 


AsCov 9 nm \B nm (e™)\ 2 


Fig. 2. A graphical representation of AsCov G where each 
term of the sum in (25) is represented by a layer. A basis 
function only affects the covariance between modules that 
also contain that basis function. Thus, the first basis function 
affects the complete covariance matrix while the last basis 
function n m only affects modules Xn m , • • ■, m. 

Remark 9 The orthogonal basis functions correspond 
to a decomposition of the output signals into orthogonal 
components and the problem in a sense becomes decou¬ 
pled. As an example, consider the system described by 

2 /i (i) = 0 i,iSi (g)u(i) + ei (t), 

2/2 (t) = 9 2t iBi(q)u(t) + e 2 (t), 

2/3 (t) = 03 ,iSi(g)u(t) + 9 3:2 B 2 {q)u(t) + e 3 (t), (26) 

Suppose that we are interested in estimating 9 3 2 . For this 
parameter, (24) becomes 


AsVar 0 3 2 = (27) 

To understand the mechanisms behind this expression, 
let ui(t) = Bi(q)u(t), and u 2 (t) = B 2 (q)u{t) so that 
the system can be visualized as in Figure 3, i.e., we can 
consider u± and u 2 as separate inputs. 

First we observe that it is only y 3 that contains informa¬ 
tion about 0 3 i 2 , and the term 0 3 ,iUi contributing to y 3 is 
a nuisance from the perspective of estimating 0 3 , 2 . This 
term vanishes when u 3 = 0 and we will not be able to 
achieve better accuracy than the optimal estimate of9 3 ^ 2 
for this idealized case. So let us study this setting first. 
Straightforward application of the least-squares method, 
using u 2 and y 3 , gives an estimate of 0 3 ,2 with variance 
X/cr 2 , which is larger than (27) when e 3 depends on e 3 
and e 2 . However, in this idealized case, y± = e\ and 
y 2 = e 2 , and these signals can thus be used to estimate 
e 3 . This estimate can then be subtracted from y 3 before 
the least-squares method is applied. The remaining noise 
iny 3 will have variance A 3 | 2 , ife 3 is optimally estimated 
(see (19)-(20 )), and hence the least-squares estimate will 
now have variance \ 3 \ 2 /cr 2 , i.e., the same as (27). 
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To understand why it is possible to achieve the same ac¬ 
curacy as this idealized case when ui is non-zero, we 
need to observe that our new inputs ufft) and U 2 (t) are 
orthogonal (uncorrelated \ 3 [ Returning to the case when 
only the output y 3 is used for estimating 9 3 ^, this implies 
that we pay no price for including the term 9 33 ui in our 
model, and then estimating 9 3 ^ and 0 3 ^ jointly, i.e., the 
variance of 9 3,2 will still be A/cr^j The question now is 
if we can use y 1 and yi as before to estimate e 3 . Perhaps 
surprisingly, we can use the same estimate as when u\ 
was zero. The reader may object that this estimate will 
now, in addition to the previous optimal estimate of e 3 , 
contain a term which is a multiple of u\. However, due 
to the orthogonality between u± and U 2 , this term will 
only affect the estimate of 9 3 \ (which we anyway were 
not interested in, in this example), and the accuracy of 
the estimate of 9 3 ^ will be \ 3 \ 2 /<J 2 , i.e. (27). Figure 4 
illustrates the setting with y 3 denoting y 3 subtracted by 
the optimal estimate of e 3 . In the figure, the new param¬ 
eter 9 3 1 reflects that the relation between u\ and y 3 is 
different from 9 3 ,i as discussed above. A key insight from 
this discussion is that for the estimate of a parameter in 
the path from input i to output j, it is only outputs that 
are not affected by input i that can be used to estimate 
the noise in output j; when this particular parameter is 
estimated, using outputs influenced by input i will intro¬ 
duce a bias, since the noise estimate will then contain a 
term that is not orthogonal to this input. In (24), this 
manifests itself in that the numerator is Aj| Xfc _i, only the 
Xk — 1 first systems do not contain Ui. 



Fig. 3. The SIMO system of Remark 9, described by (26). 


We now turn our attention to the variance of the indi¬ 
vidual transfer function estimates. 


4 This since u{t) is white and B\ and B 2 are orthonormal. 

4 With u 1 and 112 correlated, the variance will be higher, see 
Section 4 for a further discussion of this topic. 



Fig. 4. The SIMO system of Remark 9, described by (26), 
but with 2/3 denoting y 3 subtracted by the optimal estimate 
of e 3 and # 3,1 reflects that the relation between ui and y 3 is 
different from @ 3 , 1 . 

Corollary 10 Let the same assumptions as in Theo¬ 
rem 7 hold. Then, for any frequency ojq , it holds that 

rii 

AsVar G z = |£ fe (e ^°)| 2 AsVar 9 hk , (28) 

k =1 

where 

AsVar 9 ik = , (29) 

£T Z 

and Aju is defined in (20). 


PROOF. Follows from Theorem 7, since (28) is a di¬ 
agonal element of (25). ■ 


From Corollary 10, we can tell when increasing the model 
order of Gj will increase the asymptotic variance of Gi. 

Corollary 11 Under the same conditions as in Theo¬ 
rem 7, if we increase the number of estimated parameters 
ofGj from nj to nj +1, the asymptotic variance ofGi will 
increase, if and only if all the following conditions hold: 

(1) nj < Hi, 

(2) efit) is not orthogonal to ej(t) conditioned on 

(3) |£„. + 1 (e^°)|V 0 . 


PROOF. See Appendix C. ■ 




































Remark 12 Corollary 11 explicitly tells when an in¬ 
crease in the model order of Gj from nj to Uj + 1 will 
increase the variance of Gi. If nj > nj, there will be no 
increase in the variance of Gi, no matter how many ad¬ 
ditional parameters we introduce to the model Gj, which 
was also seen the introductory example in Section 2.2. 
Naturally, if ei{t) is orthogonal to ejft) conditioned on 
e j\i(t), does not depend on ejft) and there is no 

increase in variance of Gi, cf. Remark 9. 

3.1 A graphical representation of Corollary 11 


If the model orders are distinct, the corresponding graph 
is given in Figure 5, where one can see that AsVar G 4 
depends on y 2 (and on 1/4 of course), but depends neither 
on 2/3 nor yi since 743 = 741 = 0, AsVar G 3 depends on 
2 /i, but not on y 2 since 732 = 0 and AsVar G 2 depends on 
2 /i, while the variance of Gi is given by (30). If n 2 = rn, 
the first condition of Corollary 11 is not satisfied and we 
have to cut the edge between G 4 and G 2 . Similarly, if 
n\ = n 2 , we have to cut the edge between Gi and Gi, 
and if additionally ni = ni = 713 , we have to cut the 
edge between G 3 and Gi- 


Following the notation in Bayesian Networks (Koski and 


Noble 2012 ), Conditions 1) and 2) in Corollary 11 can 
be" interpreted graphically. Each module is represented 
by a vertex in a weighted directed acyclic graph Q. Let 
the vertices be ordered by model order, i.e., let the first 
vertex correspond to Gi. Under Assumption 1, with the 
additional assumption that module i is the first module 
with order m , let there be an edge, denoted by j —► i, 
from vertex j to i, j < i, if ei(t) is not orthogonal to ej ( t ) 
conditioned on ej\i(t). Notice that this is equivalent to 
7 ij ^ 0. Let the weight of the edge be 7 ij and define 
the parents of vertex i to be all nodes with a link to 
vertex i, i.e., pag(i) := {j : j —> i}. Then, (29), together 
with Lemma 5, shows that only outputs corresponding to 
parents of node i affect the asymptotic variance. Indeed, 
a vertex without parents has variance 


A,, 


AsVar G,= ^ ^ |B fe (e^°)| 2 , 


(30) 


fc=1 



Fig. 5. Graphical representation of Conditions 1) and 2) in 
Corollary 11 for the Cholesky factor given in (31). 


4 Connection between MISO and SIMO 


There is a strong connection between the results pre- 
se nted here and those re garding MISO systems presented 


Ramazi et al. (20141. We briefly restate the problem 


formulation and some results from Ramazi et al. (2014) 


to show the connection. The MISO data generating sys¬ 
tem is in some sense the dual of the SIMO case. With 
m spatially correlated inputs and one output, a MISO 
system is described by 


which corresponds to (28) with 

Aj|o = • • ■ = K\i-i = A,. 

Thus, AsVar Gj is independent of the model order of the 
other modules. 


y(t) = 


Gi{q) G 2 (q) ... G m (q) 


ui(q) 

ui(q) 

u m (q). 


+ e(t). 


As an example, consider four systems with the lower 
Cholesky factor of the covariance of eft) given by: 


1 0.4 0.2 

0 


0.4 1.16 0.08 

0.3 


0.2 0.08 1.04 

0 


0 0.3 0 


1.09 


1 0 0 0 


1 

0 0 0 

0.4 1 0 0 


0.4 

1 0 0 

0.2 0 1 0 


0.2 

0 1 0 

0 0.3 0 1 


0 0.3 0 1 


Ach 


The input sequence {u(t)} is zero mean and temporally 
white, but may be correlated in the spatial domain, 

E [w(f)] = 0 

E [uft)u(s) T ] = Stsllu, 

for some positive definite matrix covariance matrix E u = 
EchSIq H , where Ech is the lower triangular Cholesky 
factor of E. The noise eft) is zero mean and has variance 
A. The asymptotic covariance of the estimated parame¬ 
ters can be expressed using ( 11 ) with 

V = <F miso := I'E C h- (32) 

We make the convention that Y^k=k 1 x k = 0 whenever 
ki > k 2 . 
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Theorem 13 (Theorem 4 in Ramazi et al. ( 2014| )) 

Under Assumption 1, but with > ri 2 > ..., > n m , for 
any frequency ujq it holds that 


AsVar G t = ^ A E I s * ( e ^°) I 


jwo \ 12 


(33) 


7 —i *U /c=n, + i4-l 


MISO systems. Let the first yy systems contain basis 
function k , so 


AsVar §jf TTSO = X Nf. l Xk 

where A 1:Xfe denotes the covariance matrix of the first 
Xk inputs. Hence 


where n m+ \ := 0 and cr^ is the variance of the non- 
estimable part ofui(t) given Uj\i(t). 


Corollary 14 (Corollary 6 in Ramazi et al. (2014)) 

Under Assumption 1, but with n± > n 2 > . ..,> n m . 
Suppose that the order of block j is increased from nj to 
Uj. |_i. Then there is an increase in the asymptotic vari¬ 
ance of Gi if and only if all the following conditions hold: 


AsCovG = A ^ 


k =1 


y- 1 

-Xk 


Om—Xk. 


\B k (e jU0 )\ 2 , 


and 

AsCov § MISO = A diag(r i ^ 1 , £f.x 2 , • • •, ) • ( 36 ) 


(1) Uj < Hi, 

(2) Ui(t) is not orthogonal to Uj(t) conditioned on 

Uj\i{t), 

(3) \B nj+1 (e J “°)| 2 7 ^ 0 . 

Remark 15 The similarities between Corollary 10 and 
Theorem 13, and between Corollary 11 and Corollary If 
are striking. In both cases it is the non-estimable part of 
the input and noise, respectively, along with the estimated 
basis functions that are the key determinants for the re¬ 
sulting accuracy. Just as in Corollary 10, Theorem 13 
can be expressed with respect to the basis functions: 

rii 

AsVar 4 = E AsVar \Bk(e ju>0 )\ 2 . (34) 

k =1 

However, now 

AsVar = 2 (35) 

a i\xk 


Note that, while the correlation between the noise 
sources is beneficial, the correlation in the input is detri¬ 
mental for the estimation accuracy. Intuitively, if we use 
the same input to parallel systems, and only observe the 
sum of the outputs, there is no way to determine the 
contribution from the individual systems. On the other 
hand, as observed in the example in Section 2.2, if the 
noise is correlated, we can construct equations with re¬ 
duced noise and improve the accuracy of our estimates. 

This difference may also be understood from the struc¬ 
ture of T, which through ( 11 ) determines the variance 
properties of any estimate. Consider a single SISO sys¬ 
tem G\ as the basic case. For the SIMO structure con¬ 
sidered in this paper, as noted before, >F SIMO of ( 12 ) is 
block upper triangular with m columns (the number of 
outputs), while !F MISO is block lower triangular with as 
many columns as inputs. <F MISO is block lower triangu¬ 
lar since T is block diagonal and £ch is lower triangu¬ 
lar in (32). Adding an output yj to the SIMO structure 
corresponds to extending \p slMO with one column (and 
Uj rows): 


where is determined by the correlation structure of the 
inputs Ui{t ) to the systems Gi(q,0i) that do share basis 
function Bu(q) (i = 1,... ,Xk)- Note that in the SIMO 
case we had 


, 7 /SIMO 


^SIMO * 
0 * 


(37) 


AsVar Oi k = 


'»! Xk 


where \\ Xk is determined by the correlation structure of 
the noise sources e*(f) affecting systems Gi(q,0i ) that 


do not share basis function B\,{q ) fi = 1,..., Xk)- Note 

( 7 ^MISO _ 

>MISO o' 

that (33) found in 

Ramazi et al. 

( 2014 \) does not draw 

the connection to the variance of the parameters. This is 


-k k 


where the zero comes from that if ,3IMO also is block up¬ 
per triangular. Since !F MISO is block lower triangular, 
adding an input Uj to the MISO structure extends <F MISO 
with nj rows (and one column): 


(38) 


made explicit in the alternate expressions (35) and (34). 


The correlation between paramet ers related to the sam e 
basis functions is not explored in Ramazi et al. (20141. 
In fact, it is possible to follow the same line of reasoning 
leading to Theorem 7 and arrive at the counter-part for 


where * denotes the added column and added row re¬ 
spectively. Addition of columns to decreases the vari¬ 
ance of G \, while addition of rows increases the variance. 
First, a short motivation of this will be given. Second, 
we will discuss the implication for the variance analysis. 
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Addition of one more column to P in (37) 


the variance of G\. With P 



decreases 

= 


™ (Yk^k), where (V’fe^fe) > 0 for every k. The vari¬ 
ance of the parameter estimate 9n decreases with the 
addition of a column, since 


i*e,*e) 1 < \{lpm+l,1p m +l) + (V, &)] 1 • 


On the other hand, addition of rows leads to an increase 
in variance of G\, e.g., consider (18) in Lemma 3, 

V 

AsCo vG 1 (9 N ) = L T Y / ^(z 0 )*B s k (z 0 )L, 
fc =1 


where L = U, 


-T 

CH 


1 0 ... 0 


for any number of inputs, 


and {B s k Y k=1 is a basis for the linear span of the rows of 
ij-miso can geen f rom (38), the first rows of ip-Miso 
are the same as for i^ MISO and the first r basis functions 
can therefore be taken the same (with a zero in the last 
column). To accommodate for the extra rows, n e extra 
basis functions {B k are needed. Thus, {B k 

is a basis for the linear span of py lso . We see that the 
variance of G\{9 e N ) is larger than G\(9n) since 


AsCov G !((>%) = AsCov G^On) 

r+n e 

+L t £ B s k (z 0 )*B s k (z 0 )L, 

k=r+1 


respectively. When only y\ is used (P = q l ) : 

AsVar 0 M = {P,P)~ X = 1. 

When ri 2 = 0, the second measurement gives a benefit 
determined by how strong the correlation is between the 
two noise sources: 

AsVar 0 M = {P,P)~ l = (1 + (1 - /S 2 )//? 2 )" 1 = /3 2 . 

However, already if we have to estimate one parameter 
in G 2 the benefit vanishes completely, i.e., forn 2 = 1: 


AsCov 9 = (P, P) 1 


1 

py^w 1 


The third case, ri 2 = 2, corresponds to the example in 
Section 2.2, which shows that the first measurement y\ 
improves the estimate of 9-2,2 (compared to only estimat¬ 
ing G 2 using y 2 ): 

AsVar 9 2>2 = A 2 |i = /3 2 . 

Example 17 We consider the corresponding MISO sys¬ 
tem with unit variance noise eft) and uft) instead having 
the same spectral factor 


£CH 


1 0 
A fY^W 


and L T Bf (z 0 )*Biy 0 )L > 0 is positive semidefinite for 
every k. 

Every additional input of the MISO system corresponds 
to addition of rows to P. The increase is strictly posi¬ 
tive provided that the explicit conditions in Corollary 14 
hold. 

Every additional output of the SIMO system corre¬ 
sponds to the addition of one more column to P. How¬ 
ever, the benefit of the additional columns is reduced by 
the additional rows arising from the additional parame¬ 
ters that need to be estimated, cf. Corollary 11 and the 
preceding discussion. When the number of additional 
parameters has reached n\ or if e± ft) is orthogonal to 
Bj(t) conditioned on Uj\ift) the benefit vanishes com¬ 
pletely. The following examples clarify this point. 


for /? £ (0,1]. We now study the impact of the second 
input U 2 (t) when 


n 2 

Gi{q) = 9\,\q 1 + 9\,2q 2 , G 2 {q) = ^ 2 ,k.q k 

fc =1 

and n 2 =0,1. These two cases correspond to P given by 
the first n 2 + 2 rows of 


yfrMISO 
^e 


(<?) 


-,-1 


V 1 - Pq 


2/7—1 


0 

0 

Pq~ 


respectively. When only U\ is used or G 2 is known (P = 
r 1 T 



Example 16 We consider the same example as in Sec¬ 
tion 2.2 for three cases of model orders of the second 
model, U 2 = 0,1, 2. These cases correspond to P given by 
the first ri 2 + 1 rows of 


AsCov 9i = (P, P) 1 = I. 

Whenn 2 = 1, the variance of0\p is increased depending 
on the correlation between the two inputs: 


P e SIMO (q) 


q 1 -VY^Gp/fiq 1 
0 1 Ifiq - 1 

0 1 /fiq ~ 2 


AsVar 9 = 


1 

P 


1 0 -PY^W 

0 (3 2 0 

0 1 
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Also notice that the asymptotic covariance of cq 1 a 2 ,i 

is given by 17 , the inverse of the covariance matrix of 

u(t) and that AsVar 6/2 = erf . As f3 goes to zero the 


variance of 


$1,1 $2,1 


increases and at /3 = 0 the two 


inputs are perfectly correlated and we loose identifiability. 


5 Effect of input spectrum 

In this section we will see how a non white input spec¬ 
trum changes the results of Section 3. Using white noise 
filtered through an AR-filter as input, we may use the 
developed results to show where in the frequency range 
the benefit of the correlation structure is focused. An 
alternative approach, when a non-white input spectrum 
is used, is to change the basis functions as discussed in 
Remark 2. However, the effect of the input filter is in¬ 
ternalized in the basis functions and it is hard to distin¬ 
guish the effect of the input filter. We will instead use 
FIR basis functions for the SIMO system, which are not 
orthogonal with respect to the inner product induced by 
the input spectrum, cf. Remark 2. We let the input filter 
be given by 


«(*) = ( 39 ) 

where w(t) is a white noise sequence with variance of 
and the order n a of A is less than the order of G\, 
i.e., n a < n;i. In this case, following the derivations of 
Theorem 7, it can be shown that 

rii 

AsVar & = J2 AsVar e hk \B k {e jui0 )\ 2 (40) 

fc=i 


where 


AsVar 9 h k 


^IXfc -1 


and the basis functions Bk have changed due to the in¬ 
put filter. The solutions boil down to finding explicit ba¬ 


sis functions Bk for the well known case (Ninness and 


Gustafsson 1997) when 


where {Bk} are the Takenaka-Malmquist functions given 

by 


Jl - I£J 2 

B k {q) ■■= - 7 - <t> k -i{q), k=l,...,n 

q-t,k 

A{q) = II 1 <t>o{q) ■= i 

»=i q ^ 

and with = 0 for k = n a + 1,..., n. We summarize 
the result in the following theorem: 


Theorem 18 Let the same assumptions as in Theorem 1 
hold. Additionally the input u(t ) is generated by an AR- 
filter as in (39). Then for any frequency ujq it holds that 


AsVar Gi = 


1 

$u{u o) 



i-ia-i 2 

|eJ“° -f k | 2 


+ Aj(ni 


A y n j- 

i=2 



n a ) 


(41) 


PROOF. The proof follows from (40) with the basis 
functions given by the Takenaka-Malmquist functions 
and using that <j>k-i(.q) is all-pass, and for k > n a , also 
Bk(q) is all-pass. This means that |£>fc(e-fo)| 2 = 1 for all 
k > n a - ■ 

Remark 19 The second sum in (41) is where the ben¬ 
efit from the correlation structure of the noise at the 
other sensors enters through This contribution 

is weighted by 1 /<& u (ui). The benefit thus enters mainly 
where T> u {lo) is small. The first sum gives a variance con¬ 
tribution that is less focused around frequencies close to 
the poles (\e^ Uo — £fc| 2 is canceled by <3> u (ui)). This contri¬ 
bution is not reduced by correlation between noise sources. 
Shaping the input thus gives a weaker effect on the asymp¬ 
totic variance than what is suggested by the asymptotic 
in model order result (1), and what would be the case if 
there would be no correlation between noise sources (re¬ 
placing \i\j-i by Xi in (41)/ 

For the example in Section 2.2 for filtered input with 
ni = 2 , 772 = 3 and n a = 1, (41) simplifies to 


Span 



Span 


q- 1 q~ 2 q~ n \ 
A{q) ’ A(q) ’ ‘ ’ A(q) J 


AsVar G 2 


A2 A2 A 2 |l 

a 2 @ u {o] 0 ) <P u (u) 0 )' 


(42) 


where A{q) = ~ &V -1 / |£k| < 1 for some set of 

specifi ed poles ..., ) and where n > n a . Then, it 

holds (Ninness and Gustafsson, 1997) that 


Span 



Span {Bi,.. .,B n } 


In Figure 6 the variance of G 2 for an input filtered with 
A{q) = 1 — O.Sg ” 1 is presented. The filter is of low-pass 
type and thus gives high input power at low frequen¬ 
cies which results in low variance at those frequencies. 
Correlation between noise sources decreases the variance 
mainly where ^ u (w) is small, i.e., at higher frequencies. 
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_I_ 

0 7r/2 


ui 

Fig. 6. Asymptotic variance of , 62 ) for /3 = 1 

and /3 = 0.4. Also shown is A 2 /a, the first term of 
AsVar G 3 (e^,02) in (42). 

6 Optimal correlation structure 

In this section we characterize the optimal correlation 
structure, in order to minimize the total variance. In the 
previous section, we have seen that not estimating a basis 
function k leads to a reduction in variance of the other 
parameters related to basis function k. In this section, 
we will find the optimal correlation structure when the 
parameterization of the system is such that n\ + 1 = 
«2 = ... = n m , i.e., the first module has one parameter 
less than the others. Let 9 := [0 2 ,ra 2 i ^ 3 ,Ti 2 > • • • % 9m,n 2 ] T , 
i.e., the sub vector of parameters related to basis function 
B n2 ( q ). Assume the SIMO structure given by (2) and let 
the input be white noise with unit variance. Recalling 
Theorem 7, the covariance matrix of 9 is given by 1 - 
In particular, the variance of the entries of 9 is 

AsVar 9k n 2 = , k = 2,..., to. (43) 

<r z 

As before, A^p is the non-estimable part of e&(f) given 

ei(t). 


noise structure using this formulation of the problem 
appears to be hard. Therefore, it turns out convenient 
to introduce an upper triangular Cholesky factorization 
of A, namely define B upper triangular such that A = 
BB T . Note that 


(1) the rows of B, bf , i = 1, ..., to, are such that 

INI 2 = A i; 

(2) E{eiei} = A u = bfb.p, 

(3) there always exists an orthonormal matrix Q such 
that B = AqhQ, where Ach is the previously de¬ 
fined lower triangular Cholesky factor. 

Lemma 20 Let 


B = 


T 

9 P 
0 M 


M G Rm-lXTO-l p g R ™-lXl 


then 

^ 2 :m|i = M(I — ~^)pp T M T . (45) 

Ai 


PROOF. See Appendix D. 


Using the previous lemma we reformulate (44); keeping 
M fixed and letting p vary, we have 

maximize Tr —Mpp T M T 
61 Ai 

s.t. || 6 i|| 2 =Ai, (46) 

with bf = [rj p 1 ]. Note that the constraint A > 0 is 
automatically satisfied. Let us define v±, v m -i as 
the right-singular vectors of M, namely the columns of 
the matrix V in the singular value decomposition (SVD) 
M = USV T . The following result provides the structure 
of B that solves (44). 


Recalling Theorem 7, the covariance matrix of 9 is given 
by ~2 A 2:m |2 ■ The total variance of 9 is defined as 

m ^ 

tvar 6 := ^ AsVar § ifn2 = Tr — A 2: m\i • 
i =2 a 

We are interested in understanding how the correlation 
of ei(£) with e 2 (t), ..., e m (£), i.e., A 12 ..., A lm , should 
be in order to obtain the minimum value of the above 
total variance. This problem can be expressed as follows: 

minimize Tr A 2 - m \ 1 

A 12 ...,A im 

s.t. A > 0, (44) 

where the constraint on A implies that not all choices of 
the entries An are allowed. Directly characterizing the 


Theorem 21 Let the input be white noise. Let n\ + 1 = 
n 2 = ... = n m . Then (44) is solved by an upper triangu¬ 
lar Cholesky factor B such that its first row is 

bf = Ar[0 vj}. (47) 

PROOF. Observe the following facts: 

(1) Tr -^Mpp T M t = A-p T M T Mp = A-||Mp|| 2 ; 

( 2 ) since bf = [77 p T ], it is clear that a candidate so¬ 
lution to (46) is of the form b\ T = [0 p * T ]. 

It follows that Problem (46) can be written as 

maximize ||Mp || 2 

b 1 

s.t. ||p || 2 = Ai, (48) 
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whose solution is known to be (a rescaling of) the first 
right singular vector of M, namely v\. Hence 6 3 T = 

Ai[o v f]. m 


Remark 22 As has been pointed out in Section 2, A 
is required to be positive definite. Thus, the ideal solu¬ 
tion provided by Theorem 21 is not applicable in practice, 
where one should expect thatq, the first entry ofbi, is al¬ 
ways nonzero. In this case, the result of Theorem 21 can 

be easily adapted, leading to 6 J T = p J 1 — 



1 

0.75 
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7 Numerical examples 

In this section, we illustrate the results derived in the 
previous sections through three sets of numerical Monte 
Carlo simulations. 


7 .1 Effect of model order 


Corollary 11 is illustrated in Figure 7, where the follow¬ 
ing systems are identified using N = 500 input-output 
measurements: 


G i = r i 9 i , i = 1,2,3 


I'M) 




I'M) = F (q)- 1 Ti(q), 
B k (q) = q~\ (49) 


Fig. 7. Sample variance of G 3 (e J “ ,6f) as a function of the 
number of estimated parameters of Gi and G 2 . 

After that, any increase in n\ or n 2 does not increase the 
variance of G 3 (e JW , 0 3 ), as can be seen in Figure 7. The 
behavior can be explained by Corollary 11: when n 3 > 
n\,n 2 , G 3 is the last block, having the highest number 
of parameters, and any increase in n\,n 2 increases the 
variance of G 3 . When for example n\ > n 3 , the blocks 
should be reordered so that G 3 comes before Gi. In this 
case, when ni increases the first condition of Corollary 
11 does not hold and hence the variance of C 3 {e^ ,0 3 ) 
does not increase further. 

7.2 Effect of removing one parameter 


with 


F(Q) = 


1 


1 - 0.8g -11 

1 T 

1-12 ■ 


0°i = 




1 0.5 0.7 


112 11 


The input u(t) is drawn from a Gaussian distribution 
with variance a 2 = 1, filtered by F(q). The measurement 
noise is normally distributed with covariance matrix A = 
AchA^u, where 


A C h 


1 0 0 
0.6 0.8 0 
0.7 0.7 0.1 


thus Ai = A 2 = A 3 = 1. The sample variance is com¬ 
puted using 

1 MC 

CoV ^=Mg£ |G3(eJ “ 0 ’ 0 °) - G 3 (e?“°,0 3 ) I 2 , 

0 fc =1 


In the second set of numerical experiments, we simulate 
the same type of system as in Section 2.2. We let f3 
vary in the interval [0.001, 1]. For each fi, we generate 
MC = 2000 Monte Carlo experiments, where in each of 
them we collect N = 500 input-output samples. At the 
*-th Monte Carlo run, we generate new trajectories for 
the input and the noise and we compute 0i as in (9). The 
sample covariance matrix, for each /?, is computed as 

1 MC 

^-^-ew-er. 

i—1 

Figure 8 shows that the variance of 0 2 \ is always close 
to one, no matter what the value of j3 is. It also shows 
that the variance of the estimate 0 22 behaves as f3 2 . In 
particular, when (3 approaches 0 (i.e., almost perfectly 
correlated noise processes), the variance of such estimate 
tends to 0. All of the observations are as predicted by 
Corollary 7 and the example in Section 2.2. 

7.3 Optimal correlation structure 


where MC = 2000 is the number of realizations of the 
input and noise. The same realizations of the input and 
noise are used for all model orders. 


A third simulation experiment is performed in order to 
illustrate Theorem 21. We consider a system with m = 3 
outputs; the modules are 


The variance of G 3 (e J “, 0 3 ) increases with increasing m, Gi(q) — 0.3q \ G- 2 (q)—0.8q 1 — OAq 2 , 

i = 1,2, but only up to the point where n.j = n 3 = 5. G 3 (g) = O.I 5 -1 + 0.2<G 2 , 
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p 

Fig. 8. Sample variance of the parameters of the first module 
(as functions of /3) when the second module has one param¬ 
eter. 

so that 

0 = [On 021 031 022 0 32 ] T = [0.3 0.8 0.1 - 0.4 0.2 ] t . 

The noise process is generated by the following upper 
triangular Cholesky factor: 


e Vl — e 2 cos a \J\ — e 2 sin a 



0 0.8 0.6 

= 

£ p 



0 M 

O 

O 


L 


where e = 0.1 and a £ [0, 7 r] is a parameter tuning the 
correlation of ei(t) with e-i (t) and 63 (f). The purpose 
of this experiment is to show that, when a is such that 
p = [vT^e^cosa vT^e 2 sina] T is aligned with the 
first right-singular vector Vi of M, then the total variance 
of the estimate of the sub-vector 0 = [On 022] T is mini¬ 
mized. In the case under analysis, v\ = [0.447 0.894 ] T = 
[cosao sinao] T , with a 0 = 1.11. We let a take values 
in [0, 7 r] and for each a we generate MC = 2000 Monte 
Carlo runs of IV = 500 input-output s_amples each. We 
compute the sample total variance of 0 as 

. MC 

tvar 0 S = ^ (( 022 ,i — O22) 2 + (032,* — 0 3 2 ) 2 ^ ) 

i—1 

where 0 224 and 632,1 are the estimates obtained at the 
*-th Monte Carlo run. 

The results of the experiment are reported in Figure 9. 
It can be seen that the minimum total variance of the 
estimate of 0 is attained for values close to ao (approx¬ 
imations are due to low resolution of the grid of values 
of a). An interesting observation regards the value of a 
for which the total variance is maximized: this happens 
when a = 2 . 68 , which yields the second right-singular 
vector of M, namely V 2 = [—0.894 0.447] T . 



Fig. 9. Total sample variance of the parameter vector 8 as 
function of a. 

8 Conclusions 

The purpose of this paper has been to examine how the 
estimation accuracy of a linear SIMO model depends on 
the correlation structure of the noise and model struc¬ 
ture and model order. A formula for the asymptotic 
covariance of the frequency response function estimate 
and the model parameters has been developed for the 
case of temporally white, but possibly spatially corre¬ 
lated additive noise. It has been shown that the vari¬ 
ance decreases when parts of the noise can be linearly 
estimated from measurements of other blocks with less 
estimated parameters. The expressions reveal how the 
order of the different blocks and the correlation of the 
noise affect the variance of one block. In particular, it 
has been shown that the variance of the block of interest 
levels off when the number of estimated parameters in 
another block reaches the number of estimated param¬ 
eters of the block of interest. The optimal correlation 
structure for the noise was determined for the case when 
one block has one parameter less than the other blocks. 
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B Proof of Theorem 10 

Before giving the proof of Theorem 10 we need the fol¬ 
lowing auxiliary lemma. 

Lemma 23 Let A > 0 and real and its Cholesky factor 
Ach be partitioned according to e\-. Xk -i and e Xk:m , 


A = 

1 

(N 

1_ 

, A ch = 

(A C h)i 0 


_Y 2 i Y 2 _ 


(Ach) 21 (Ach) 2 _ 


Then 


A Xk'.m\Xk-l = ( A Ch)2(A C h)2 ■ 


PROOF. By the derivations of Lemma A, for some 
v(t) with Cov vft) = /, e(t) = AcHvft) and V\ :x -\{t) 
are known since ei, x _i ft) are known. Furthermore 
^XkMxk-i^) = ( A CH) 2 iVi: X -i(t ), which implies 

e Xk'-m\xk-l(t ) — = ( A CH)2 v Xk-m{t ) an d 

the results follows since Cov v Xk:m (t) = I. ■ 

The asymptotic variance is given by (11) with 


A Proof of Lemma 5 


Let vft) = A c left) for some real valued random var i¬ 
abl e e(t) (A f}„ exis ts and is unique for A > 0 (Horn 


and Johnson 1990)). Then Cov vft) = I. Similarly 
eft) = Achv\JT- The set (iq(t),..., Vjft)} is a func¬ 
tion of ei (t ),..., &j ft) only and vice versa, for all 
1 < j < m. Thus, if e\(t),. .., Bjff) are known, then 
also {ui(t),..., Vj(t)} are known, but nothing is known 
about {vj + i(t),... ,v m (t)}. Thus, for j < i the best 
linear estimator of ei(t) given ei(£),..., ejff), is 




(A.l) 


k=1 


and 


Sift) - e*| j(t) 


Hq) = Hq)A c T H- 

Let n = ni H— ■ + n m . From the upper triangular struc¬ 
ture of Aff 1 ^ and ni < n 2 < ... < n m , an orthonormal 
basis for <S^, the subspace spanned by the rows of T, is 
given by 


B s k {e j “) := 
B s k (en := 

B s k (en := 


B k 0 ... 0 
0 B k - ni 0 ... 


...0 6 , 


k—n-\-n n 


k= 1 ,..., ni 
k=ni + 1 ,... ,n 2 

(B-1) 

k=n — n m + l,..., n. 


First note that 


dG 

1)6 

Then, using Theorem 3, 


= T>A, 


CH- 


has variance 


l 

•^li = Tik- 

k=j +1 


For the last part of the lemma, we realize that the de¬ 
pendence of e-i\j{t) on ejff) in Equation (A.l) is given by 
Tij/ljj (since v\ (t), ..., Vj-iff) do not depend on ejft)). 
Hence e^jft) depends on ejff) if and only if 7 ^ 7 ^ 0. 


a 2 AsVar G = A CH J2 B s k (e^)*B s k (^)A T CH . 

k =1 

Sorting the sum with respect to the basis functions 
£? fc (e J “°), we get 


a 2 AsCov Gi = A ch ^ \B k (C“°)\ 2 

k =1 


®Xk~ 1 
0 


0 

I 


A 


T 

CH- 
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Using Lemma 23 


so that 


AsCov G = — ^ 


k —1 


Ov,- 


0 


Xfc-1 
0 ^Xk'-m\xk-l. 


\B k (e^°)\ 2 . 


T = 

I(k) = 


Bil( 1) B 2 I( 2 )... B nm I(n m ) 


/m-Xfc+l 


jmx (m-Xfe+l) 


/l - 1 = B^B- 1 = 


V 


T — 1 

-Q V 


—qrj 1 M 1 M 1 + qq 1 


Thus (25) follows. We now show the first part of the 
theorem, that 

AsCov 0 —^ diag(A 1:m , A^ 2 . m |^. 2 _|,..., ^ixn^ —i)- 

The covariance of G can be expressed as 

AsCov G = TAsCov d T* (B.2) 

where 


Hence, using the Sherman-Morrison formula (Bartlett 


1951) 


where 


^ 2 :m|i = (M t M 1 +qq T ) 


-1 


= MM T - -MM T qq T MM T 
k 


= MM 1 - - 


T 1 Mpp T M T 


T 

VP 


k = l + q 1 MM 1 q = 1 + 


rr+p T p ||&i| 


rj 




?7 Z 


77 z 


However, (B.2) equals (25) for all w and the theorem 
follows. 


so (45) follows. 


C Proof of Corollary 11 

To prove Corollary 11 we will use (28). First, we make 
the assumption that j is the last module that has rij 
parameters. This assumption is made for convenience 
since reordering all modules with rij estimated parame¬ 
ters does not change (28). First of all, we see that if 

rij > rij, 

then (28) does not increase when rij increases. If instead 
rij < m, 

the increase in variance is given by 

which is non-zero iff 7 ij 7 ^ 0 and |£> nj ._|_i(e J “ 0 )| 2 7 ^ 0 . 
From Lemma 5 the theorem follows. 


D Proof of Lemma 20 


The inverse of B is 


B- 1 = 


77 -q 

0 M- 1 


, q T := 7 r 1 p T M~ l 
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